Skip to content

Rebuild upsert metadata from validDocIds snapshot on reload for TTL tables#18860

Draft
deepthi912 wants to merge 1 commit into
apache:masterfrom
deepthi912:block-force-download-ttl-upsert-dedup
Draft

Rebuild upsert metadata from validDocIds snapshot on reload for TTL tables#18860
deepthi912 wants to merge 1 commit into
apache:masterfrom
deepthi912:block-force-download-ttl-upsert-dedup

Conversation

@deepthi912

@deepthi912 deepthi912 commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

Summary

Reloading a segment of an upsert table with metadata TTL (metadataTTL or
deletedKeysTTL) previously rescanned every doc and re-added every primary key,
which resurrected keys the TTL had already expired or deleted — producing
incorrect query results.

With snapshots enabled, segment reload now rebuilds the upsert metadata from the
persisted validDocIds snapshot (valid docs only).

Behavior (upsert + metadata TTL, snapshots enabled)

Reload snapshot present no snapshot
normal CRC match → rebuild from snapshot; CRC mismatch → fail fail (use forceDownload)
forceDownload rebuild from snapshot (even on CRC mismatch — operator-accepted) full scan

Segment commit and refresh paths are unchanged — only the reload path
places a snapshot on the incoming segment, so they keep the regular full-scan
behavior.

Notes

  • Supersedes the earlier approach of blocking forceDownload reload at the
    controller for upsert/dedup TTL tables; that API-level check is removed.
  • Dedup tables are out of scope (no validDocIds snapshot).

Backward-incompatible

A normal (non-forceDownload) reload of an upsert table with metadata TTL and
snapshots enabled now fails when the segment CRC has changed or no
validDocIds snapshot exists; use a forceDownload reload in that case.

@deepthi912 deepthi912 requested a review from Jackie-Jiang June 26, 2026 07:15
@codecov-commenter

codecov-commenter commented Jun 26, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 35.48387% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.79%. Comparing base (a9b5207) to head (8154665).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
.../pinot/core/data/manager/BaseTableDataManager.java 5.00% 16 Missing and 3 partials ⚠️
...cal/upsert/BasePartitionUpsertMetadataManager.java 90.90% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18860      +/-   ##
============================================
- Coverage     64.79%   64.79%   -0.01%     
+ Complexity     1322     1316       -6     
============================================
  Files          3393     3393              
  Lines        211265   211334      +69     
  Branches      33212    33236      +24     
============================================
+ Hits         136896   136932      +36     
- Misses        63320    63343      +23     
- Partials      11049    11059      +10     
Flag Coverage Δ
custom-integration1 ?
integration 0.00% <ø> (-100.00%) ⬇️
integration1 ?
integration2 0.00% <ø> (?)
java-21 64.79% <35.48%> (-0.01%) ⬇️
temurin 64.79% <35.48%> (-0.01%) ⬇️
unittests 64.79% <35.48%> (-0.01%) ⬇️
unittests1 56.99% <3.22%> (-0.02%) ⬇️
unittests2 37.17% <32.25%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@deepthi912 deepthi912 added upsert Related to upsert functionality dedup Changes related to realtime ingestion dedup handling bug Something is not working as expected labels Jun 26, 2026
@deepthi912 deepthi912 marked this pull request as draft June 26, 2026 17:24
…ables

Reloading a segment of an upsert table with metadata TTL (metadataTTL or
deletedKeysTTL) previously rescanned every doc and re-added every primary key,
which resurrected keys the TTL had already expired or deleted, producing
incorrect query results.

With snapshots enabled, segment reload now rebuilds the upsert metadata from the
persisted validDocIds snapshot (valid docs only):
- normal reload: proceeds only when the segment CRC matches and a snapshot
  exists; otherwise it fails with an error pointing the operator to forceDownload.
- forceDownload reload: always proceeds, using the snapshot if present (the
  operator accepts a possible CRC mismatch), otherwise a full scan.

Segment commit and refresh paths are unchanged: only the reload path places a
snapshot on the incoming segment, so they keep the regular full-scan behavior.

This supersedes the earlier approach of blocking forceDownload reload at the
controller for upsert/dedup TTL tables; that API-level check is removed.

Backward-incompatible: a normal (non-forceDownload) reload of an upsert table
with metadata TTL and snapshots enabled now fails when the segment CRC has
changed or no validDocIds snapshot exists; use a forceDownload reload in that
case.
@deepthi912 deepthi912 force-pushed the block-force-download-ttl-upsert-dedup branch from c46bf01 to 8154665 Compare June 27, 2026 07:25
@deepthi912 deepthi912 added the backward-incompat Introduces a backward-incompatible API or behavior change label Jun 27, 2026
@deepthi912 deepthi912 changed the title Donot allow force-download reload for upsert/dedup tables with when TTL is set Rebuild upsert metadata from validDocIds snapshot on reload for TTL tables Jun 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backward-incompat Introduces a backward-incompatible API or behavior change bug Something is not working as expected dedup Changes related to realtime ingestion dedup handling upsert Related to upsert functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants